- Path: news.clark.net!not-for-mail
- From: gusty@clark.net (Harlan Messinger)
- Newsgroups: comp.lang.c++
- Subject: Re: Help with float/double data types
- Date: 31 Jan 1996 21:08:57 GMT
- Organization: Clark Internet Services, Inc., Ellicott City, MD USA
- Message-ID: <4eolp9$726@clarknet.clark.net>
- References: <96030.105940RWL380B@MAINE.MAINE.EDU>
- NNTP-Posting-Host: explorer.clark.net
- Mime-Version: 1.0
- Content-Type: TEXT/PLAIN; charset=ISO-8859-1
- Content-Transfer-Encoding: 8bit
- X-Newsreader: TIN [UNIX 1.3 950726BETA PL0]
-
- RWL380B@MAINE.MAINE.EDU wrote:
- : C++ Experts,
- :
- :
- : If I declare X, Y, Z and Zstd as float:
- : float X=0.0,Y=0.0,Z=0.0,Zstd=0.0;
- : Then the output looks correct but the digits after the decimal points
- : in X are incorrect (original first line X = -2082085.85000). The
- : first value of Y is incorrect, but the next values are correct!
- :
- :
- : X-Coordinate Y-Coordinate Z-Est Z StdDev n
- : -2082085.87500 -2127847.00000 23.12121 1.77931 8
- : -2082085.87500 -2047847.12500 23.13800 1.73910 8
- : -2082085.87500 -1967847.12500 23.15039 1.69751 8
- : -2082085.87500 -1887847.12500 23.15789 1.65532 8
- : -2082085.87500 -1807847.12500 23.15990 1.61336 8
- : -2082085.87500 -1727847.12500 32.19845 1.55000 8
- : -2082085.87500 -1647847.12500 38.46202 1.49013 8
- : -2082085.87500 -1567847.12500 32.17017 1.45502 8
- : -2082085.87500 -1487847.12500 39.51803 1.40543 8
- : -2082085.87500 -1407847.12500 47.27689 1.37395 8
- :
- :
- : If I declare X,Y,Z and Zstd to be double then the output is
- : correct.
- :
- : X-Coordinate Y-Coordinate Z-Est Z StdDev n
- : -2082085.85000 -2127847.12500 23.12121 1.77931 8
- : -2082085.85000 -2047847.12500 23.13800 1.73910 8
- : -2082085.85000 -1967847.12500 23.15039 1.69751 8
- : -2082085.85000 -1887847.12500 23.15789 1.65532 8
- : -2082085.85000 -1807847.12500 23.15990 1.61336 8
- : -2082085.85000 -1727847.12500 32.19845 1.55000 8
- : -2082085.85000 -1647847.12500 38.46202 1.49013 8
- : -2082085.85000 -1567847.12500 32.17017 1.45502 8
- : -2082085.85000 -1487847.12500 39.51803 1.40543 8
- : -2082085.85000 -1407847.12500 47.27689 1.37395 8
- :
- : Coming from Fortran, this should not be happening.
-
- Why not? The difference between real and double precision in Fortran is
- about the same as between float and double in C++.
-
- : Borland's
- : documentation suggests that negative floats and doubles don't
- : exist.
-
- You are misreading it somehow.
-
- : However, their web page has a technical document
- : describing how floating points are represented and indicate
- : that negative floats and doubles are OK. I am puzzled
- : by the behavior I describe above. And yes, I need access to
- : the X, Y and Z values as numbers for certain statistics I
- : need to include so I can't just read them as strings.
- : Any help is greatly Appreciated!
- :
-
- Floating point numbers are stored, at least on PCs, in binary scientific
- notation. The sign of the number is stored in one bit. The absolute value
- of the number is then expressed uniquely in the form s * 2^e (s times 2
- to the e power), where s >= 0.5 but s < 1, and e is an integer that can
- be positive, negative or zero. The values s (significand) and e (exponent)
- are stored in some set of bits.
-
- On a PC, a 32-bit float uses one bit for the sign, 8 bits for the
- exponent and the remaining 23 bits for the significand. The significand
- is expressed as a binary fractional expansion; the first digit is chopped
- off (it is always 1, since s is at least 1/2 but less than 1, so to save
- space and allow greater precision the 1 is assumed instead of being saved
- explicitly); and the next 23 binary places are stored. The exponent e is
- stored in biased form: a fixed offset is added to it so that the 8-bit
- field always holds a nonnegative integer.
-
- The number -2127847.12500 can be expressed in binary scientific notation
- as (-).1[00000011101111110011100]1000... times 2 to the 22nd power. The bits
- between brackets are the 23 bits that would be saved in a float. What
- remains after the right bracket is the rounding error, which is
- a single 1 in the 25th position after the point--that is, 2 to the
- negative 25th power times 2 to the 22nd power = 2 to the negative 3rd
- power = 1/8 = 0.125, exactly the amount by which your printed output was off.
-
- The number -2047847.12500 in binary scientific notation is
- (-).1[11110011111101100111001] times 2 to the 21st power, EXACTLY. This
- value can be specified fully in the 24 binary places provided (23 stored
- plus one imputed), so no rounding error occurs.
-
- I haven't checked your other numbers individually, but this gives you the
- general idea.
-
- How many digits of precision does one get from 24 binary places (bits)?
- The rightmost bit in the significand represents 2^(-24), which is about 6
- times 10^(-8). If all the bits in the significand are 1s, then the
- significand is just under 1. Therefore, the precision of the significand is
- about 6 parts per 100,000,000, allowing about seven significant figures
- in decimal representation.
-
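- The 8-bit exponent field has 256 possible values. Two of them are reserved
- (one for zero and denormals, one for infinity and NaN), so in the notation
- used here e can vary from -125 to 128, and the magnitude of normalized
- floats ranges from 0.5 * 2^(-125) to just under 1.0 * 2^128. This is
- equivalent to a range from about 10^(-38) to 3 * 10^38 in decimal terms.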
-
- Doubles are stored in 64 bits on a PC: one bit for sign, 11 for exponent,
- and 52 for significand. The 52 bits, together with the implied leading 1,
- allow for about 15 significant digits.
- The 11 bits in the exponent allow for orders of magnitude from 10^(-308)
- to 10^308.
-